Rank | Count | Beginning |
---|---|---|
7277 | 1066 | The |
8713 | 310 | This |
6770 | 257 | Subcommittee |
3461 | 252 | In |
1773 | 227 | Dept. |
3949 | 221 | It |
1322 | 220 | Committee |
85 | 149 | A |
2885 | 137 | He |
3257 | 126 | I |
5045 | 120 | NET |
9471 | 113 | We |
5288 | 103 | Office |
8118 | 97 | There |
974 | 84 | Bureau |
8303 | 82 | These |
569 | 81 | As |
4944 | 77 | National |
6621 | 75 | State |
8620 | 75 | They |
33 | 70 | • |
2572 | 68 | For |
3872 | 65 | ISSN |
6142 | 61 | S.) |
3151 | 60 | However, |
9100 | 54 | To |
3331 | 53 | If |
9478 | 52 | “We |
701 | 48 | At |
2076 | 48 | Division |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
Especially for N = 1, a small corpus is already enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
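To make the query's behaviour explicit outside the database, here is a minimal Python sketch of the same counting step. It assumes the sentences are available as a plain list of strings; the helper names `substring_index` and `top_beginnings` are hypothetical. The helper mimics MySQL `SUBSTRING_INDEX` for positive counts only: it returns everything before the N-th single space, or the whole sentence if it has fewer spaces. Because the split is on a single space character, punctuation stays attached to the token, which is why entries such as "However," and "“We" appear in the table as separate beginnings. Raising `n` to 2, 3 or 4 gives the multi-word beginnings of the following subsections.

```python
from collections import Counter


def substring_index(s: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX for a positive count: return everything
    before the count-th occurrence of delim, or the whole string if delim
    occurs fewer than count times."""
    if count <= 0:
        raise ValueError("this sketch only handles positive counts")
    # split at most `count` times, keep the first `count` pieces, rejoin them
    return delim.join(s.split(delim, count)[:count])


def top_beginnings(sentences, n=1, limit=50):
    """Count the first n space-separated tokens of every sentence and
    return the `limit` most frequent as (beginning, count) pairs."""
    counts = Counter(substring_index(s, " ", n) for s in sentences)
    # most_common sorts by count; ties keep first-encountered order
    return counts.most_common(limit)


if __name__ == "__main__":
    # hypothetical toy input, not taken from the corpus
    sample = [
        "The committee met today.",
        "The bureau issued a report.",
        "However, the vote was delayed.",
        "“We will continue,” she said.",
        "The committee adjourned.",
    ]
    print(top_beginnings(sample, n=1))  # one-word beginnings
    print(top_beginnings(sample, n=2))  # two-word beginnings
```

Note that SQL leaves the order of rows with equal counts unspecified, whereas `Counter.most_common` keeps ties in the order they were first seen, so the two may list tied beginnings differently.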
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.2.1 Most Frequent Sentence Endings I
4.3.2.2 Most Frequent Sentence Endings II
4.3.2.3 Most Frequent Sentence Endings III
4.3.2.4 Most Frequent Sentence Endings IV